Bisimulation Metrics are Optimal Value Functions
Abstract
Bisimulation is a notion of behavioural equivalence on the states of a transition system. Its definition has been extended to Markov decision processes, where it can be used to aggregate states. A bisimulation metric is a quantitative analog of bisimulation that measures how similar states are from the perspective of long-term behavior. Bisimulation metrics have been used to establish approximation bounds for state aggregation and other forms of value function approximation. In this paper, we prove that a bisimulation metric defined on the state space of a Markov decision process is the optimal value function of an optimal coupling of two copies of the original model. We prove the result in the general case of continuous state spaces. This result has important implications for understanding the complexity of computing such metrics, and opens up the possibility of more efficient computational methods.
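For orientation, one common way to write a bisimulation metric in the literature is as a fixed point of an operator that combines reward differences with a Kantorovich (Wasserstein) distance between transition distributions. The sketch below is not taken from the abstract: the symbols $r$, $P$, $c_R$, $c_T$, $W_d$ and $\Lambda$ are assumed notation, and the paper's own formulation may differ in detail.

```latex
% Assumed notation (not from the abstract): r(s,a) is the reward, P(.|s,a) the
% transition distribution, c_R, c_T >= 0 weights with c_T < 1, and
% Lambda(mu, nu) the set of couplings of mu and nu.
\begin{align*}
  d(s,t) &= \max_{a \in A} \Big( c_R \,\lvert r(s,a) - r(t,a) \rvert
            + c_T \, W_d\big(P(\cdot \mid s,a),\, P(\cdot \mid t,a)\big) \Big), \\
  W_d(\mu,\nu) &= \inf_{\lambda \in \Lambda(\mu,\nu)} \int d(x,y)\, \lambda(dx,dy).
\end{align*}
```

Read this way, the infimum over couplings is the place where an optimal coupling of two copies of the model enters, which is the connection to an optimal value function that the abstract describes.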
Similar resources
Metrics for Markov Decision Processes with Infinite State Spaces
We present metrics for measuring state similarity in Markov decision processes (MDPs) with infinitely many states, including MDPs with continuous state spaces. Such metrics provide a stable quantitative analogue of the notion of bisimulation for MDPs, and are suitable for use in MDP approximation. We show that the optimal value function associated with a discounted infinite horizon planning tas...
Bisimulation Metrics for Continuous Markov Decision Processes
In recent years, various metrics have been developed for measuring the behavioural similarity of states in probabilistic transition systems [Desharnais et al., Proceedings of CONCUR, (1999), pp. 258-273, van Breugel and Worrell, Proceedings of ICALP, (2001), pp. 421-432]. In the context of finite Markov decision processes, we have built on these metrics to provide a robust quantitative analogue...
Knowledge Transfer in Markov Decision Processes
Markov Decision Processes (MDPs) are an effective way to formulate many problems in Machine Learning. However, learning the optimal policy for an MDP can be a time-consuming process, especially when nothing is known about the policy to begin with. An alternative approach is to find a similar MDP, for which an optimal policy is known, and modify this policy as needed. We present a framework for ...
An approximation theory for discrete event and continuous time systems
Established system relationships for discrete systems, such as language inclusion, simulation, and bisimulation, require system observations to be identical. When interacting with the physical world, modeled by continuous or hybrid systems, exact relationships are restrictive and not robust. In this paper, we develop the first framework of system approximation that applies to both discrete and ...
Optimal Stochastic Control in Continuous Time with Wiener Processes: General Results and Applications to Optimal Wildlife Management
We present a stochastic optimal control approach to wildlife management. The objective value is the present value of hunting and meat, reduced by the present value of the costs of plant damages and traffic accidents caused by the wildlife population. First, general optimal control functions and value functions are derived. Then, numerically specified optimal control functions and value func...
Publication date: 2014